
    Introducing a framework to assess newly created questions with Natural Language Processing

    Statistical models such as those derived from Item Response Theory (IRT) enable the assessment of students on a specific subject, which can be useful for several purposes (e.g., learning path customization, drop-out prediction). However, the questions have to be assessed as well and, although IRT can estimate the characteristics of questions that have already been answered by several students, this technique cannot be used on newly generated questions. In this paper, we propose a framework to train and evaluate models for estimating the difficulty and discrimination of newly created Multiple Choice Questions by extracting meaningful features from the text of the question and of the possible choices. We implement one model using this framework and test it on a real-world dataset provided by CloudAcademy, showing that it outperforms previously proposed models, reducing the RMSE by 6.7% for difficulty estimation and by 10.8% for discrimination estimation. We also present the results of an ablation study performed to support our choice of features and to show the effects of different characteristics of the questions' text on difficulty and discrimination. Comment: Accepted at the International Conference on Artificial Intelligence in Education
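
As a minimal sketch of the kind of pipeline this abstract describes, the snippet below extracts a few hand-crafted text features from a question and its choices and defines the RMSE metric used to compare estimated difficulty/discrimination against IRT-calibrated values. The specific features here are invented for illustration; the paper's actual feature set is richer and is validated by its ablation study.

```python
import math

def extract_features(question: str, choices: list) -> dict:
    # Toy readability-style features of the sort a text-based
    # difficulty estimator might use (illustrative only).
    words = question.split()
    return {
        "question_length": len(words),
        "avg_word_length": sum(len(w) for w in words) / max(len(words), 1),
        "num_choices": len(choices),
        "avg_choice_length": sum(len(c.split()) for c in choices) / max(len(choices), 1),
    }

def rmse(predicted, observed):
    # Root-mean-square error, the evaluation metric quoted
    # in the abstract (6.7% / 10.8% reductions).
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(predicted))

feats = extract_features(
    "Which layer of the OSI model handles routing?",
    ["Network", "Transport", "Session", "Physical"],
)
print(feats["num_choices"])                        # 4
print(round(rmse([0.9, 1.2], [1.0, 1.0]), 3))      # 0.158
```

A regression model would then map such feature vectors to the IRT difficulty and discrimination parameters estimated from student responses.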

    Statistical modeling of ground motion relations for seismic hazard analysis

    We introduce a new approach for ground motion relations (GMR) in probabilistic seismic hazard analysis (PSHA), influenced by the extreme value theory of mathematical statistics. Therein, we understand a GMR as a random function. We mathematically derive the principle of area equivalence: two alternative GMRs have an equivalent influence on the hazard if they have equivalent area functions. This includes local biases. Interpreting the difference between two such GMRs (an actual and a modeled one) as a random component leads to a general overestimation of residual variance and hazard. Besides this, we discuss important aspects of classical approaches and uncover discrepancies with the state of the art in stochastics and statistics (model selection and significance, tests of distribution assumptions, extreme value statistics). In particular, we criticize the assumption that residuals of maxima such as the peak ground acceleration (PGA) are log-normally distributed. The natural distribution of its individual random component (equivalent to exp(epsilon_0) of Joyner and Boore 1993) is the generalized extreme value distribution. We show by numerical studies that the actual distribution can be hidden, and that a wrong distribution assumption can influence the PSHA as negatively as neglecting area equivalence does. Finally, we suggest an estimation concept for GMRs in PSHA with a regression-free variance estimation of the individual random component. We demonstrate the advantages of event-specific GMRs by analyzing data sets from the PEER strong motion database and estimating event-specific GMRs. Therein, the majority of the best models are based on an anisotropic point-source approach. The residual variance of the logarithmized PGA is significantly smaller than in previous models. We validate the estimates for the event with the largest sample using empirical area functions.
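
The abstract's central distributional claim, that block maxima such as PGA follow a generalized extreme value (GEV) law rather than the commonly assumed log-normal, can be illustrated with a small simulation. The data below are invented for illustration (the paper works with PEER strong-motion records); the sketch fits both candidate models and compares their log-likelihoods.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated individual random components: per-block maxima of 50
# log-normal draws each. Extreme value theory says such maxima tend
# to a GEV law, not a log-normal one.
maxima = np.exp(rng.normal(size=(2000, 50))).max(axis=1)
log_max = np.log(maxima)  # log-normal maxima <=> maxima of normals

# Fit both candidate models to the logarithmized maxima: a GEV, and
# the normal implied by a log-normal assumption on the original scale.
gev_params = stats.genextreme.fit(log_max)
norm_params = stats.norm.fit(log_max)

# Compare fits by total log-likelihood (higher is better).
ll_gev = stats.genextreme.logpdf(log_max, *gev_params).sum()
ll_norm = stats.norm.logpdf(log_max, *norm_params).sum()
print(ll_gev > ll_norm)
```

On data like this, the GEV fit should dominate, which mirrors the abstract's warning that a log-normal assumption on maxima misstates the tail that drives the hazard.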

    Effects of growth rate, size, and light availability on tree survival across life stages: a demographic analysis accounting for missing values and small sample sizes.

    The data set supporting the results of this article is available in the Dryad repository, http://dx.doi.org/10.5061/dryad.6f4qs (Moustakas, A. and Evans, M. R. (2015) Effects of growth rate, size, and light availability on tree survival across life stages: a demographic analysis accounting for missing values). Plant survival is a key factor in forest dynamics, and survival probabilities often vary across life stages. Studies specifically aimed at assessing tree survival are unusual, and so data initially designed for other purposes often need to be used; such data are more likely to contain errors than data collected for this specific purpose.

    A randomised controlled trial and cost-effectiveness evaluation of "booster" interventions to sustain increases in physical activity in middle-aged adults in deprived urban neighbourhoods

    Background: Systematic reviews have identified a range of brief interventions which increase physical activity in previously sedentary people. There is an absence of evidence about whether follow up beyond three months can maintain long term physical activity. This study assesses whether it is worth providing motivational interviews, three months after giving initial advice, to those who have become more active. Methods/Design: Study candidates (n = 1500) will initially be given an interactive DVD and receive two telephone follow ups at monthly intervals checking on receipt and use of the DVD. Only those that have increased their physical activity after three months (n = 600) will be randomised into the study. These participants will receive either a "mini booster" (n = 200), "full booster" (n = 200) or no booster (n = 200). The "mini booster" consists of two telephone calls one month apart to discuss physical activity and maintenance strategies. The "full booster" consists of a face-to-face meeting with the facilitator at the same intervals. The purpose of these booster sessions is to help the individual maintain their increase in physical activity. Differences in physical activity, quality of life and costs associated with the booster interventions, will be measured three and nine months from randomisation. The research will be conducted in 20 of the most deprived neighbourhoods in Sheffield, which have large, ethnically diverse populations, high levels of economic deprivation, low levels of physical activity, poorer health and shorter life expectancy. Participants will be recruited through general practices and community groups, as well as by postal invitation, to ensure the participation of minority ethnic groups and those with lower levels of literacy. 
Sheffield City Council and the Primary Care Trust fund a range of facilities and activities to promote physical activity; variations in access to these between neighbourhoods will make it possible to examine whether the effectiveness of the intervention is modified by access to community facilities. A one-year integrated feasibility study will confirm that recruitment targets are achievable, based on a 10% sample. Discussion: The choice of study population, study interventions, brief intervention preceding the study, and outcome measures are discussed.

    Neurocognitive functioning in acute or early HIV infection

    We examined neurocognitive functioning among persons with acute or early HIV infection (AEH) and hypothesized that the neurocognitive performance of AEH individuals would be intermediate between HIV seronegatives (HIV−) and those with chronic HIV infection. Comprehensive neurocognitive testing was accomplished with 39 AEH, 63 chronically HIV infected, and 38 HIV− participants. All AEH participants were HIV infected for less than 1 year. Average domain deficit scores were calculated in seven neurocognitive domains. HIV−, AEH, and chronically HIV infected groups were ranked from best (rank of 1) to worst (rank of 3) in each domain. All participants received detailed substance use, neuromedical, and psychiatric evaluations, and HIV infected persons provided information on antiretroviral treatment and completed laboratory evaluations including plasma and CSF viral loads. A nonparametric test of ordered alternatives (Page test), and the appropriate nonparametric follow-up test, were used to evaluate level of neuropsychological (NP) functioning across and between groups. The median duration of infection for the AEH group was 16 weeks [interquartile range, IQR: 10.3–40.7] as compared to 4.9 years [2.8–11.1] in the chronic HIV group. A Page test using ranks of average scores in the seven neurocognitive domains showed a significant monotonic trend with the best neurocognitive functioning in the HIV− group (mean rank = 1.43), intermediate neurocognitive functioning in the AEH group (mean rank = 1.71), and the worst in the chronically HIV infected (mean rank = 2.86; L statistic = 94, p < 0.01); however, post-hoc testing comparing neurocognitive impairment of each group against each of the other groups showed that the chronically infected group was significantly different from both the HIV− and AEH groups on neurocognitive performance; the AEH group was statistically indistinguishable from the HIV− group. 
Regression models among HIV infected participants were unable to identify significant predictors of neurocognitive performance. Neurocognitive functioning was worst among persons with chronic HIV infection. Although a significant monotonic trend existed and the patterns in the data suggest that AEH individuals may be intermediate between HIV− and chronically infected participants, we were not able to statistically confirm this hypothesis.
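
The Page test used here ranks the three groups within each of the seven neurocognitive domains and sums the rank totals weighted by the hypothesized order. The sketch below computes Page's L statistic; the per-domain ranks are reconstructed to be consistent with the reported mean ranks (1.43, 1.71, 2.86) and L = 94, since the abstract does not give the actual domain-by-domain rankings.

```python
def page_l(ranks_per_block):
    # Page's L for ordered alternatives: within each block (here, a
    # neurocognitive domain) the k treatments (groups) carry ranks
    # 1..k, and L = sum over treatments j of j * (rank sum of j).
    # L is maximal when every block ranks the groups in the
    # hypothesized order (here: HIV- < AEH < chronic impairment).
    k = len(ranks_per_block[0])
    rank_sums = [sum(block[j] for block in ranks_per_block) for j in range(k)]
    return sum((j + 1) * rank_sums[j] for j in range(k))

# Illustrative ranks for 7 domains (columns: HIV-, AEH, chronic),
# chosen only to reproduce the abstract's mean ranks and L value.
domain_ranks = [(1, 2, 3)] * 5 + [(2, 1, 3), (3, 1, 2)]
print(page_l(domain_ranks))        # 94
print(page_l([(1, 2, 3)] * 7))     # 98, the maximum for 7 blocks of 3
```

The observed L = 94 sits close to the maximum of 98, which is why the monotonic trend is significant even though the post-hoc pairwise tests could not separate the AEH group from the seronegatives.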

    Use of mixed methods designs in substance research: a methodological necessity in Nigeria

    The utility of mixed methods (qualitative and quantitative) is becoming increasingly accepted in the health sciences, but substance studies have yet to substantially benefit from it. While there is a growing number of mixed-methods alcohol articles concerning developed countries, developing nations have yet to embrace this method, and in the Nigerian context the importance of mixed-methods research is yet to be acknowledged. This article, therefore, draws on alcohol studies to argue that mixed-methods designs will better equip scholars to understand, explore, describe and explain why alcohol consumption and its related problems are increasing in Nigeria. It argues that, as motives for consuming alcohol in contemporary Nigeria are multiple, complex and evolving, mixed-methods approaches that provide multiple pathways for proffering solutions to problems should be embraced.

    Development of high-throughput methods to screen disease caused by Rhizoctonia solani AG 2-1 in oilseed rape

    Background: Rhizoctonia solani (Kühn) is a soil-borne, necrotrophic fungus causing damping off, root rot and stem canker in many cultivated plants worldwide. Oilseed rape (OSR, Brassica napus) is the primary host for anastomosis group (AG) 2-1 of R. solani, causing pre- and post-emergence damping-off resulting in death of seedlings and impaired crop establishment. Presently, there are no known resistant OSR genotypes, and the main methods for disease control are fungicide seed treatments and cultural practices. The identification of sources of resistance for crop breeding is essential for sustainable management of the disease. However, a high-throughput, reliable screening method for resistance traits is required. The aim of this work was to develop a low-cost, rapid screening method for disease phenotyping and identification of resistance traits. Results: Four growth systems were developed and tested: (1) nutrient media plates, (2) compost trays, (3) light expanded clay aggregate (LECA) trays, and (4) a hydroponic pouch and wick system. Seedlings were inoculated with virulent AG 2-1 to cause damping-off disease and grown for a period of 4–10 days. Visual disease assessments were carried out, or disease was estimated through image analysis using ImageJ. Conclusion: Inoculation of LECA was the most suitable method for phenotyping disease caused by R. solani AG 2-1, as it enabled the detection of differences in disease severity among OSR genotypes within a short time period whilst allowing measurements to be conducted on whole plants. This system is expected to facilitate identification of resistant germplasm.

    MKEM: a Multi-level Knowledge Emergence Model for mining undiscovered public knowledge

    Background: Since Swanson proposed the Undiscovered Public Knowledge (UPK) model, there have been many approaches to uncover UPK by mining the biomedical literature. These earlier works, however, required substantial manual intervention to reduce the number of possible connections, and were mainly applied to disease-effect relations. With the advancement of biomedical science, it has become imperative to extract and combine information from multiple disjoint studies and articles to infer new hypotheses and expand knowledge. Methods: We propose MKEM, a Multi-level Knowledge Emergence Model, to discover implicit relationships using Natural Language Processing techniques such as Link Grammar and ontologies such as the Unified Medical Language System (UMLS) MetaMap. The contribution of MKEM is as follows: First, we propose a flexible knowledge emergence model to extract implicit relationships across different levels, such as the molecular level for genes and proteins and the phenomic level for diseases and treatments. Second, we employ MetaMap for tagging biological concepts. Third, we provide an empirical and systematic approach to discover novel relationships. Results: We applied our system to 5000 abstracts downloaded from the PubMed database. Because a gold standard is not yet available, we performed our own performance evaluation. Our system achieved good precision and recall, and we generated 24 hypotheses. Conclusions: Our experiments show that MKEM is a powerful tool for discovering hidden relationships residing in extracted entities, represented by our Substance-Effect-Process-Disease-Body Part (SEPDB) model.
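
The core idea MKEM builds on is Swanson's ABC model of literature-based discovery: if term A co-occurs with B in some abstracts, and B with C in others, but A and C never co-occur directly, then A-C is a candidate hidden relationship. The sketch below implements just that co-occurrence skeleton on Swanson's classic fish-oil/Raynaud example (the toy abstracts are invented); MKEM itself layers Link Grammar parsing and UMLS MetaMap concept tagging on top of this idea.

```python
# Each "abstract" is reduced to the set of tagged concepts it mentions.
abstracts = [
    {"fish oil", "blood viscosity"},          # A-B literature
    {"blood viscosity", "raynaud syndrome"},  # B-C literature
    {"fish oil", "platelet aggregation"},
    {"platelet aggregation", "raynaud syndrome"},
]

def cooccurring(term, docs):
    # Terms that appear in the same abstract as `term`.
    return {t for doc in docs if term in doc for t in doc} - {term}

def candidate_links(a, docs):
    # C-terms reachable from `a` through a shared B-term but never
    # directly co-mentioned with `a`: Swanson-style candidate UPK.
    direct = cooccurring(a, docs)
    via_b = {c for b in direct for c in cooccurring(b, docs)}
    return via_b - direct - {a}

print(candidate_links("fish oil", abstracts))  # {'raynaud syndrome'}
```

Generated A-C pairs are only hypotheses; like MKEM's 24 hypotheses, they still need expert or experimental validation.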